# GRPO reinforcement learning

ReasonGen R1 SFT

ReasonGen-R1 is a text-to-image model trained through supervised fine-tuning (SFT) on datasets of image prompts paired with reasoning rationales, which gives it an explicit, text-based 'thinking' ability.

Gazal R1 32B GRPO Preview

Gazal-R1-32B is a language model specifically designed for medical reasoning and clinical decision-making. It is built on Qwen 3 32B and demonstrates excellent performance in the professional medical field.

Large Language Model

Seg Zero 7B Best On ReasonSegTest

Seg-Zero-7B is an image segmentation model guided by a reasoning chain. It uses a decoupled architecture made up of a reasoning model and a segmentation model, and achieves zero-shot generalization through GRPO reinforcement learning.

Image Segmentation

Transformers English

Qwen2.5 0.5B Instruct Gensyn Swarm Peaceful Exotic Butterfly

A fine-tuned version of Gensyn/Qwen2.5-0.5B-Instruct, trained with the TRL framework and the GRPO algorithm, suitable for instruction-following tasks.

Large Language Model

Captain Eris Violet GRPO V0.420

Captain-Eris_Violet is an advanced language model developed using multi-stage supervised fine-tuning, QLoRA adapters, and GRPO-optimized RLHF. It is suited to role-playing and dialogue generation.

Large Language Model

Transformers English

© 2025 AIbase